Markov network structure discovery using independence tests
نویسندگان
چکیده
We investigate efficient algorithms for learning the structure of a Markov network from data using the independence-based approach. Such algorithms conduct a series of conditional independence tests on data, successively restricting the set of possible structures until there is only a single structure consistent with the outcomes of the conditional independence tests executed (if possible). As Pearl has shown, the instances of the conditional independence relation in any domain are theoretically interdependent, made explicit in his well-known conditional independence axioms. The first couple of algorithms we discuss, GSMN and GSIMN, exploit Pearl’s independence axioms to reduce the number of tests required to learn a Markov network. This is useful in domains where independence tests are expensive, such as cases of very large data sets or distributed data. Subsequently, we explore how these axioms can be exploited to “correct” the outcome of unreliable statistical independence tests, such as in applications where little data is available. We show how the problem of incorrect tests can be mapped to inference in inconsistent knowledge bases, a problem studied extensively in the field of non-monotonic logic. We present an algorithm for inferring independence values based on a sub-class of nonmonotonic logics: the argumentation framework. Our results show the advantage of using our approach in the learning of structures, with improvements in the accuracy of learned networks of up to 20%. As an alternative to logic-based interdependence among independence tests, we also explore probabilistic interdependence. Our algorithm, called PFMN, takes a Bayesian particle filtering approach, using a population of Markov network structures to maintain the posterior probability distribution over them given the outcomes of the tests performed. The result is an approximate algorithm (due to the use of particle filtering) that is useful in domains where independence tests are expensive.
منابع مشابه
Efficient Markov Network Structure Discovery using Independence Tests
We present two algorithms for learning the structure of a Markov network from data: GSMN* and GSIMN. Both algorithms use statistical independence tests to infer the structure by successively constraining the set of structures consistent with the results of these tests. Until very recently, algorithms for structure learning were based on maximum likelihood estimation, which has been proved to be...
متن کاملEfficient Markov Network Discovery Using Particle Filters
In this paper we introduce an efficient independence-based algorithm for the induction of the Markov network structure of a domain from the outcomes of independence test conducted on data. Our algorithm utilizes a particle filter (sequential Monte Carlo) method to maintain a population of Markov network structures that represent the posterior probability distribution over structures, given the ...
متن کاملEfficient and Robust Independence-Based Markov Network Structure Discovery
In this paper we introduce a novel algorithm for the induction of the Markov network structure of a domain from the outcome of conditional independence tests on data. Such algorithms work by successively restricting the set of possible structures until there is only a single structure consistent with the conditional independence tests executed. Existing independence-based algorithms havewellkno...
متن کاملLearning Markov Network Structure using Few Independence Tests
In this paper we present the Dynamic Grow-Shrink Inference-based Markov network learning algorithm (abbreviated DGSIMN), which improves on GSIMN, the state-ofthe-art algorithm for learning the structure of the Markov network of a domain from independence tests on data. DGSIMN, like other independence-based algorithms, works by conducting a series of statistical conditional independence tests to...
متن کاملLearning Bayesian Network Structure using Markov Blanket in K2 Algorithm
A Bayesian network is a graphical model that represents a set of random variables and their causal relationship via a Directed Acyclic Graph (DAG). There are basically two methods used for learning Bayesian network: parameter-learning and structure-learning. One of the most effective structure-learning methods is K2 algorithm. Because the performance of the K2 algorithm depends on node...
متن کامل